Proposal for Speech Understanding Research
Arthur Samuel, Senior Research Associate
Principal Investigator
Submitted to the
Advanced Research Projects Agency
July 1973
Computer Science Department
School of Humanities and Sciences
Stanford University
ABSTRACT
A two-year research effort, beginning in October 1973, is proposed on
the use of automatic training methods to adapt speech understanding
systems to the characteristics of the speaker. The work would involve
a small staff of people with special capabilities in this field and
would require a budget of $226,266. No new facilities would be
required.
TABLE OF CONTENTS
Section
    1. Proposal
    2. Facilities
    3. Budget
Appendix
    A. Initial Form of Signature Table for Speech Recognition
    B. Speech Research at Stanford University
    C. Bibliography
    D. Cognizant Personnel
1. A PROPOSAL FOR SPEECH UNDERSTANDING RESEARCH
It is proposed that the work on speech recognition that is
now under way in the A.I. project at Stanford University be continued
and extended with broadened aims in the field of speech
understanding. This work gives considerable promise both of solving
some of the immediate problems that beset speech understanding
research and of providing a basis for future advances.
It is further proposed that this work be more closely tied to
the ARPA Speech Understanding Research effort than it has been in the
past and that it have as its express aim the study and application to
speech recognition of a machine learning process that has proved
highly successful in another application and that has already been
tested out to a limited extent in speech recognition. The machine
learning process offers both an automatic training scheme and the
inherent ability of the system to adapt to various speakers and
dialects. Speech recognition via machine learning represents a global
approach to the speech recognition problem and can be incorporated
into a wide class of limited vocabulary systems.
Finally we would propose accepting responsibility for keeping
other ARPA projects supplied with operating versions of the best
current programs that we have developed. The availability of the high
quality front end that the signature table approach provides would
enable designers of the various over-all systems to test the relative
performance of the top-down portions of their systems without having
to make allowances for the deficiencies of their currently available
front ends. Indeed, if the signature table scheme can be made simple
enough to compete on a time basis (and we believe that it can) then
it may replace the other front end schemes that are currently in
favor.
Stanford University is well suited as the site for such work,
having both the facilities for this work and a staff of people with
experience and interest in machine learning, phonetic analysis, and
digital signal processing. The staff at present consists of the
proposed Principal Investigator Arthur L. Samuel and Dr. Neil Miller,
who has had considerable experience in the analysis and synthesis of
human voice signals using digital processes. An additional research
associate together with a few graduate students would complete the
team. It is anticipated that this staff of not more than 3 full time
members with the help of 2 or 3 graduate students could mount a
meaningful program, which should be funded for a minimum of two years
to ensure continuity of effort. We would expect to demonstrate the
utility of the Signature Table approach within this time span and to
provide a working system that could be used as the front end for any
of the speech understanding systems that are currently under
development or are being planned.
Ultimately we would like to have a system capable of
understanding speech from an unlimited domain of discourse and with
an unknown speaker. It seems not unreasonable to expect the system to
deal with this situation very much as people do when they adapt their
understanding processes to the speaker's idiosyncrasies during the
conversation. The signature table method gives promise of
contributing toward the solution of this problem as well as being a
possible answer to some of the more immediate problems.
The initial thrust of the proposed work would be toward the
development of adaptive learning techniques, using the signature
table method and some more recent variants and extensions of this
basic procedure. We have already demonstrated the usefulness of this
method for the initial assignment of significant features to the
acoustic signals. One of the next steps will be to extend the method
to include acoustic-phonetic probabilities in the decision process.
Still another aspect to be studied would be the amount of
preprocessing that should be done and the desired balance between
bottom-up and top-down approaches. It is fairly obvious that
decisions of this sort should ideally be made dynamically depending
upon the familiarity of the system with the domain of discourse and
with the characteristics of the speaker. Compromises will
undoubtedly have to be made in any immediately realizable system but
we should understand better than we now do the limitations on the
system that such compromises impose.
It may be well at this point to describe the general
philosophy that has been followed in the work that is currently under
way and the results that have been achieved to date. We have been
studying elements of a speech recognition system that are not
dependent upon the use of a limited vocabulary and that can recognize
continuous speech by a number of different speakers.
Such a system should be able to function successfully either
without any previous training for the specific speaker in question or
after a short training session in which the speaker would be asked to
repeat certain phrases designed to train the system on those phonetic
utterances that seemed to depart from the previously learned norm. In
either case, it is believed that some automatic or semi-automatic
training system should be employed to acquire the data that is used
for the identification of the phonetic information in the speech. We
believe that this can best be done by employing a modification of the
signature table scheme previously described. A brief review of this
earlier form of signature table is given in reference 17.
The over-all system is envisioned as one in which the more
or less conventional method is used of separating the input speech
into short time slices, for each of which some sort of frequency
analysis (homomorphic, linear predictive coding, or the like) is
performed. We
then interpret this information in terms of significant features by
means of a set of signature tables. At this point we define longer
sections of the speech called segments which are obtained by
grouping together varying numbers of the original slices on the
basis of their similarity. This then takes the place of other forms
of initial segmentation. Having identified a series of segments in
this way we next use another set of signature tables to extract
information from the sequence of segments and combine it with a
limited amount of syntactic and semantic information to define a
sequence of phonemes.
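The slice-and-group pipeline just described can be summarized in a
short program. The following Python sketch is illustrative only: the
windowing constants follow the 12.8 ms and 6.4 ms figures given later
in this proposal, but the feature computation, the distance measure,
and the similarity threshold are stand-ins rather than the procedures
actually used.

    # A minimal sketch of the slice/feature/segment pipeline, assuming
    # a 20K samples/sec signal; the analysis and the similarity test
    # are placeholders, not the project's actual procedures.
    import numpy as np

    def slices(signal, width=256, overlap=128):
        """Cut the signal into overlapping slices (12.8 ms, 6.4 ms overlap)."""
        for start in range(0, len(signal) - width + 1, width - overlap):
            yield signal[start:start + width]

    def features(time_slice):
        """Stand-in frequency analysis: log magnitude spectrum of one slice."""
        spectrum = np.abs(np.fft.rfft(time_slice * np.hanning(len(time_slice))))
        return np.log1p(spectrum)

    def segments(feature_seq, threshold=2.0):
        """Group varying numbers of adjacent slices on the basis of similarity."""
        groups, current = [], [feature_seq[0]]
        for f in feature_seq[1:]:
            if np.linalg.norm(f - current[-1]) < threshold:
                current.append(f)       # similar enough: extend the segment
            else:
                groups.append(current)  # dissimilar: close the segment
                current = [f]
        groups.append(current)
        return groups

In the actual system the feature step is a set of signature tables
rather than a raw spectrum, and the similarity test operates on their
outputs.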
While it would be possible to extend this bottom-up approach
still further, it seems reasonable to break off at this point and
revert to a top-down approach from here on. The real difference in
the overall system would then be that the top-down analysis would
deal with the outputs from the signature table section as its
primitives rather than with the outputs from the initial measurements
either in the time domain or in the frequency domain. In the case of
inconsistencies the system could either refer to the second choices
retained within the signature tables or if need be could always go
clear back to the input parameters. The decision as to how far to
carry the initial bottom-up analysis must depend upon its relative
cost, both in complexity and in processing time, and the certainty
with which it can be performed, as compared with the corresponding
costs and certainty for the rest of the analysis, taking due notice
of the time cost of recovering from false starts.
Signature tables can be used to perform four essential
functions that are required in the automatic recognition of speech.
These functions are: (1) the elimination of superfluous and redundant
information from the acoustic input stream, (2) the transformation of
the remaining information from one coordinate system to a more
phonetically meaningful coordinate system, (3) the mixing of
acoustically derived data with syntactic, semantic and linguistic
information to obtain the desired recognition, and (4) the
introduction of a learning mechanism.
The following three advantages emerge from this method of
training and evaluation.
1) Essentially arbitrary inter-relationships between the
input terms are taken into account by any one table. The only loss of
accuracy is in the quantization.
2) The training is a very simple process of accumulating
counts. The training samples are introduced sequentially, and hence
simultaneous storage of all the samples is not required.
3) The process linearizes the storage requirements: a hierarchy
of small tables grows linearly with the number of tables, rather than
exponentially with the number of input parameters, as a single table
over the full parameter space would.
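These three points can be made concrete with a small hypothetical
fragment. In the sketch below, all names and the quantization rule
are assumptions for illustration: a single 64-entry table indexed by
two quantized 3-bit inputs can represent any relationship between
them, and training merely increments two count fields.

    # Hedged illustration of advantages (1)-(3): arbitrary input
    # inter-relationships are captured by table lookup, training is
    # count accumulation, and each table is a fixed small block.
    def quantize(x, bits=3):
        """Map an input in [0.0, 1.0] onto a 3-bit level, 0..7."""
        return min(int(x * (1 << bits)), (1 << bits) - 1)

    def entry_index(inputs, bits=3):
        """Pack quantized inputs into one index (two 3-bit inputs -> 0..63)."""
        index = 0
        for x in inputs:
            index = (index << bits) | quantize(x, bits)
        return index

    table = [{"yes": 0, "no": 0} for _ in range(64)]  # one 64-entry table

    def train(inputs, desired):
        """Training stores no samples, only counts."""
        table[entry_index(inputs)]["yes" if desired else "no"] += 1

For example, train((0.9, 0.2), True) adds one "yes" count to entry 57;
no training sample need be retained.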
The signature tables, as used in speech recognition, must be
particularized to allow for the multi-category nature of the output.
Several forms of tables have been investigated. An overview of the
current system is given in Appendix A. For some early results see SUR
Note 43, "Some Preliminary Experiments in Speech Recognition Using
Signature Tables" by R. B. Thosar and A. L. Samuel [20].
Work is currently under way on a major refinement of the
signature table approach which adopts a somewhat more rigorous
procedure. Preliminary results with this scheme indicate that a
substantial improvement has been achieved. This effort is described
in a recent report, SUR Note 81, "Estimation of Probability Densities
Using Signature Tables for Application to Pattern Recognition", by
R. B. Thosar [21].
We are currently involved in work on a segmentation procedure
which has already demonstrated its ability to compete with other
proposed segmentation systems, even when used to process speech from
speakers whose utterances were not used during the training sequence.
2. FACILITIES
The computer facilities of the Stanford Artificial Intelligence
Laboratory include the following equipment.
Central Processors: Digital Equipment Corporation PDP-10 and PDP-6
Primary Store: 65K words of 1.7 microsecond DEC Core
65K words of 1 microsecond Ampex Core
131K words of 1.6 microsecond Ampex Core
Swapping Store: Librascope disk (5 million words, 22 million
bits/second transfer rate)
File Store: IBM 3330 disc file, 6 spindles (leased)
Peripherals: 4 DECtape drives, 2 mag tape drives, line printer,
Calcomp plotter, Xerox Graphics Printer
Communications
Processor: BBN IMP (Honeywell DDP-516) connected to the
ARPA network.
Terminals: 58 TV displays, 6 III displays, 3 IMLAC displays,
1 ARDS display, 15 Teletype terminals
Special Equipment: Audio input and output systems, hand-eye
equipment (2 TV cameras, 3 arms), remote-
controlled cart
Existing and planned facilities will be adequate to support this
proposal, hence no additional facilities are budgeted.
3. BUDGET
Two years beginning October 1, 1973
BUDGET CATEGORY                                      YEAR 1     YEAR 2
----------------------------------------------------------------------
I.   SALARIES & WAGES:
     Samuel, A.L., Senior Research Associate,
       Principal Investigator, 75%                   20,000     20,000
     ------, Research Associate                      14,520     14,520
     Miller, N.J., Research Associate                13,680     13,680
     ------, Student Research Assistant,
       50% academic year, 100% summer                 4,914      5,070
     ------, Student Research Assistant,
       50% academic year, 100% summer                 4,914      5,070
     Reserve for Salary Increases @ 5.5% per year     3,192      6,592
                                                    -------    -------
     TOTAL SALARIES AND WAGES                       $61,220    $64,932
II.  STAFF BENEFITS:
     17.0%  10-1-73 to 8-31-74                        9,540
     18.3%  9-1-74 to 8-31-75                           934     10,894
     19.3%  9-1-75 to 9-30-75                                    1,042
                                                    -------    -------
     TOTAL STAFF BENEFITS                           $10,474    $11,936
III. TRAVEL:
     Domestic -
       Local                                            150
       East Coast                                       450
                                                    -------    -------
                                                       $600       $600
IV.  EXPENDABLE MATERIALS & SERVICES:
     A. Telephone Service                               480
     B. Office Supplies                                 600
                                                    -------    -------
                                                     $1,080     $1,080
V.   PUBLICATIONS COST:
     2 Papers @ $500 ea.                             $1,000     $1,000
                                                    -------    -------
VI.  TOTAL DIRECT COSTS:
     (Items I through V)                            $74,374    $79,548
VII. INDIRECT COSTS:
     On Campus - 47% of NTDC                        $34,956    $37,388
                                                   --------   --------
VIII. TOTAL COSTS:
     (Items VI + VII)                              $109,330   $116,936
APPENDIX A. INITIAL FORM OF SIGNATURE TABLE FOR SPEECH RECOGNITION
The signature tables, as used in speech recognition, must be
particularized to allow for the multi-category nature of the output.
Several forms of tables have been investigated. The initial form
tested and used for the data presented in the attached paper uses
tables consisting of two parts, a preamble and the table proper. The
preamble contains: (1) space for saving a record of the current and
recent output reports from the table, (2) identifying information as
to the specific type of table, (3) a parameter that identifies the
desired output from the table and that is used in the learning
process, (4) a gating parameter specifying the input that is to be
used to gate the table, (5) the sign of the gate, (6) the gating
level to be used, and (7) parameters that identify the sources of the
normal inputs to the table.
All inputs are limited in range and specify either the
absolute level of some basic property or more usually the probability
of some property being present. These inputs may be from the original
acoustic input or they may be the outputs of other tables. If from
other tables, they may be for the current time step or for earlier
time steps (subject to practical limits as to the number of time
steps that are saved).
The output, or outputs, from each table are similarly limited
in range and specify, in all cases, a probability that some
particular significant feature, phonette, phoneme, word segment, word
or phrase is present.
We are limiting the range of inputs and outputs to values
specified by 3 bits and the number of entries per table to 64,
although this choice of values is a matter to be determined by
experiment. We are also providing for any of the following input
combinations: (1) one input of 6 bits, (2) two inputs of 3 bits each,
(3) three inputs of 2 bits each, and (4) six inputs of 1 bit each.
The uses to which these different forms are put will be described
later.
The body of each table contains entries corresponding to
every possible combination of the allowed input parameters. Each
entry in the table actually consists of several parts. There are
fields assigned to accumulate counts of the occurrences of incidents
in which the specifying input values coincided with the different
desired outputs from the table, as found during previous learning
sessions, and there are fields containing the summarized results of
these learning sessions, which are used as outputs from the table.
The outputs from the tables can then express to the allowed accuracy
all possible functions of the input parameters.
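The layout just described might be rendered as the following
speculative Python structure. The field names are invented for
illustration, and the original implementation certainly differed in
detail.

    # A speculative layout of one signature table, following the
    # preamble items (1)-(7) and the body described above.
    from dataclasses import dataclass, field

    @dataclass
    class SignatureTable:
        # --- preamble ---
        recent_outputs: list = field(default_factory=list)  # (1) recent reports
        table_type: int = 0        # (2) identifying information
        desired_output: int = 0    # (3) output identity used in learning
        gate_input: int = -1       # (4) input used to gate the table (-1: none)
        gate_sign: int = 1         # (5) sign of the gate
        gate_level: int = 0        # (6) gating level
        input_sources: tuple = ()  # (7) sources of the normal inputs
        # --- body: one entry per input combination (two 3-bit inputs -> 64) ---
        counts: list = field(default_factory=lambda: [[0, 0] for _ in range(64)])
        outputs: list = field(default_factory=lambda: [0] * 64)  # summarized, 0..7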
Operation in the Training Mode
When operating in the training mode the program is supplied
with a sequence of stored utterances with accompanying phonetic
transcriptions. Each sample of the incoming speech signal is
analyzed (Fourier transforms or inverse filter equivalent) to obtain
the necessary input parameters for the lowest level tables in the
signature table hierarchy. At the same time reference is made to a
table of phonetic "hints" which prescribes the desired outputs from
each table for all possible phonemic inputs. The
signature tables are then processed.
The processing of each table is done in two steps, one
process at each entry to the table and the second only periodically.
The first process consists of locating a single entry line within the
table as specified by the inputs to the table and adding a 1 to the
appropriate field to indicate the presence of the property specified
by the hint table as corresponding to the phoneme specified in the
phonemic transcription. At this time a report is also made as to the
table's output as determined from the averaged results of previous
learning so that a running record may be kept of the performance of
the system. At periodic intervals all tables are updated to
incorporate recent learning results. To make this process easily
understandable, let us restrict our attention to a table used to
identify a single significant feature, say voicing. The hint table
will identify whether or not the phoneme currently being processed is
to be considered voiced. If it is voiced, a 1 is added to the "yes"
field of the entry line located by the normal inputs to the table. If
it is not voiced, a 1 is added to the "no" field. At updating time
the output that this entry will subsequently report is determined by
dividing the accumulated sum in the "yes" field by the sum of the
numbers in the "yes" and the "no" fields, and reporting this quantity
as a number in the range from 0 to 7. Actually the process is a bit
more complicated than this and it varies with the exact type of table
under consideration, as reported in detail elsewhere. Outputs from
the signature tables are not probabilities, in the strict sense, but
are the statistically-arrived-at odds based on the actual learning
sequence.
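For the voicing example, the two processes might be sketched as
below, reusing the table layout assumed earlier. The rounding rule
shown is a guess at the simple case; as noted above, the actual
process is more complicated and varies with the type of table.

    # First process: performed at each entry to the table.
    def train_step(table, index, voiced):
        """Add a 1 to the "yes" or "no" field of the located entry line."""
        table.counts[index][0 if voiced else 1] += 1

    # Second process: performed only periodically, at updating time.
    def update(table):
        """Summarize counts as the output reported thereafter, range 0..7."""
        for i, (yes, no) in enumerate(table.counts):
            if yes + no:
                table.outputs[i] = round(7 * yes / (yes + no))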
The preamble of the table has space for storing twelve past
outputs. An input to a table can be delayed to that extent. This
table relates outcomes of previous events with the preset hint (the
learning input). A certain amount of context-dependent learning is
thus possible, with the limitation that the specified delays are
constant.
The interconnected hierarchy of tables forms a network which
runs incrementally, in steps synchronous with the time window over
which the input signal is analyzed. The present window width is set
at 12.8 ms (256 points at 20K samples/sec) with an overlap of 6.4 ms.
Inputs to this network are the parameters abstracted from the
frequency analyses of the signal, and the specified hint. The
outputs of the network could be either the probability attached to
every phonetic symbol or the output of a table associated with a
feature such as voiced, vowel, etc. The point to be made is that the
output generated for a sample is essentially independent of its
contiguous samples. The dependency achieved by using delays in the
inputs is invisible in the outputs. The outputs thus report the best
estimate of what the current acoustic input is, with no relation to
the past outputs. Relating the successive outputs along the time
dimension is realized by counters.
The Use of COUNTERS
The transition from initial sample space to segment space is
made possible by means of COUNTERS, which are summed and
reinitialized whenever their inputs cross specified threshold
values, being triggered on when the input exceeds the threshold and
off when it falls below. Momentary spikes are eliminated by
specifying a time hysteresis, the number of consecutive samples for
which the input must be above the threshold. The output of a
counter provides information about starting time, duration and
average input for the period it was active.
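A single counter might behave as in the following sketch; the
symmetric on/off hysteresis and the exact reporting convention are
assumptions, not a specification of the actual mechanism.

    # One COUNTER: triggered on when the input exceeds the threshold
    # for `hysteresis` consecutive samples, off symmetrically; on
    # termination it reports starting time, duration and average input.
    class Counter:
        def __init__(self, threshold, hysteresis=2):
            self.threshold, self.hysteresis = threshold, hysteresis
            self.active, self.run = False, 0
            self.start, self.total, self.n = None, 0.0, 0

        def step(self, t, x):
            """Feed one sample; return (start, duration, average) at trigger-off."""
            crossing = (x > self.threshold) != self.active
            self.run = self.run + 1 if crossing else 0
            if self.run >= self.hysteresis:      # momentary spikes eliminated
                self.run = 0
                if not self.active:              # trigger on
                    self.active, self.start = True, t
                    self.total, self.n = 0.0, 0
                else:                            # trigger off: sum and reinitialize
                    self.active = False
                    return (self.start, t - self.start, self.total / self.n)
            if self.active:
                self.total, self.n = self.total + x, self.n + 1
            return None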
Since a counter can reference a table at any level in the
hierarchy of tables, it can reflect any desired degree of information
reduction. For example, a counter may be set up to show a section of
speech to be a vowel, a front vowel or the vowel /I/. The counters
can be looked upon as representing a mapping of parameter-time space
into a feature-time space or, at a higher level, a symbol-time space.
It may be useful to carry along the feature information as a backup
in those situations where the symbolic information is not acceptable
for syntactic or semantic interpretation.
In the same manner as the tables, the counters run
completely independently of each other. In a recognition run the
counters may overlap in arbitrary fashion, may leave gaps where
no counter has been triggered, or may not line up nicely. A properly
segmented output, in which the consecutive sections are in time
sequence and are neatly labeled, is essential for further
processing. This is achieved by registering the instants when the
counters are triggered or terminated to form time slices called
segments.
An event is the period between successive activation or
termination of any counter. An event shorter than a specified time
is merely ignored. A record of event durations and up to three
active counters, ordered according to their probability, is
maintained.
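This bookkeeping might look as follows; the names are invented, and
the ranking is assumed to use the counters' reported probabilities.

    # Form events from the instants at which any counter is triggered
    # or terminated; ignore events shorter than a minimum duration and
    # keep at most the three most probable active counters per event.
    def events(boundaries, active_counters, min_duration):
        """boundaries: sorted trigger/termination instants.
        active_counters(t0, t1): (counter, probability) pairs active then."""
        out = []
        for t0, t1 in zip(boundaries, boundaries[1:]):
            if t1 - t0 < min_duration:
                continue                  # too-short events are merely ignored
            ranked = sorted(active_counters(t0, t1), key=lambda cp: -cp[1])
            out.append((t0, t1 - t0, ranked[:3]))
        return out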
An event resulting from the processing described so far
represents a phonette - one of the basic speech categories defined as
hints in the learning process. It is only an estimate of closeness to
a speech category, based on past learning. Also, each category has a
more-or-less stationary spectral characterization. Thus a category
may have a phonemic equivalent, as in the case of vowels; it may be
common to a phoneme class, as for the voiced or unvoiced stop gaps;
or it may be subphonemic, as a T-burst or a K-burst. The choices are
based on acoustic expediency, i.e. optimization of the learning,
rather than any linguistic considerations. However, higher-level
interpretive programs may best operate on inputs resembling a
phonemic transcription.
The contiguous segments may be coalesced into phoneme-like units
using dyadic or triadic probabilities and acoustic-phonetic rules
particular to the system. For example, a period of silence followed
by a type of burst or a short friction may be combined to form the
corresponding stop. A short friction or a burst following a nasal or
a lateral may be called a stop even if the silence period is short or
absent. Clearly these rules must be specific to the system, based on
the confidence with which durations and phonette categories are
recognized.
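The two example rules might be rendered as in the sketch below; the
category names and rule forms are illustrative only, since the rules
themselves are stated to be specific to the system.

    # Coalesce contiguous segments into phoneme-like units by rule:
    # silence + burst/short-friction -> the corresponding stop, and a
    # burst/short-friction after a nasal or lateral -> a stop even
    # without the silence. Category labels here are invented.
    def coalesce(segments):
        """segments: (category, duration) pairs in time sequence."""
        out, i = [], 0
        while i < len(segments):
            cat, dur = segments[i]
            nxt = segments[i + 1][0] if i + 1 < len(segments) else None
            if cat == "silence" and nxt in ("burst", "short-friction"):
                out.append(("stop", dur + segments[i + 1][1]))
                i += 2
            elif cat in ("nasal", "lateral") and nxt in ("burst", "short-friction"):
                out.append((cat, dur))            # keep the nasal or lateral
                out.append(("stop", segments[i + 1][1]))
                i += 2
            else:
                out.append((cat, dur))
                i += 1
        return out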
B. SPEECH RESEARCH AT STANFORD UNIVERSITY
Efforts to establish a vocal communication link with a
digital computer have been underway at Stanford since 1963. These
efforts have been primarily concerned with four areas of research.
First, basic research in extracting phonemic and linguistic
information from speech waveforms has been pursued. Second, the
application of automatic learning processes has been investigated.
Third, the use of syntax and semantics to aid speech recognition
has been explored. Finally, the application of speech recognition
systems to the control of other processes developed at the Artificial
Intelligence Facility has been carried out. These efforts have
been carried out in parallel with varying emphasis on particular
factors at different times. None of the facets of this research has
been solved completely. However, each limited success has provided
insight and direction which opened a wealth of challenging, state of
the art, research projects.
The fruits of Stanford's speech research program were first
seen in October 1964 when Raj Reddy published a report describing
his preliminary investigations on the analysis of speech waveforms
[1]. This report described the initial digital processes developed
for analyzing waveforms of vowels and consonants, fundamental
frequency, and formants. These processes were used as the basis
for a simple vowel recognition system and synthesis of sounds.
By 1966 Reddy had built a much larger system which obtained a
phonemic transcription and which achieved segmentation of connected
phrases utilizing hypothesis testing [2]. This system represented a
significant contribution towards speech sound segmentation [3]. This
system operated on a subset of the speech of a single cooperative
speaker.
By 1967 Reddy and his students had refined several of his
processes and published papers on phoneme grouping for speech
recognition [4], pitch period determination of speech sounds [5],
and computer recognition of connected speech [6]. At this time Reddy
was considering the introduction of learning into his processes at
several stages. He was also supervising several related student
projects including limited vocabulary speech recognition, a phoneme
string to word string transcription program, a syllable junction
program, and telephone speech recognition.
1968 was an extremely productive year for Professor Reddy
and his speech group. Pierre Vicens published a report on
preprocessing for speech analysis [7]; Reddy published a paper on
the computer transcription of phonemic symbols [8]; Reddy and Ann
Robinson published a paper on phoneme-to-grapheme translation of
English [9]; Reddy and Vicens published a paper on procedures for
segmentation of connected speech [10]; and Reddy presented a paper
in Japan on consonantal clustering and connected speech recognition
[11]. In addition to this basic speech research, a paper by John
McCarthy, Lester Earnest, Raj Reddy, and Pierre Vicens was
presented at the 1968 Fall Joint Computer Conference entitled "A
Computer With Hands, Eyes, and Ears" which, in part, described
the vocal control of the artificial arm developed at Stanford [12].
By 1969 the Stanford-developed speech processes were
successfully segmenting and parsing continuous utterances from a
restricted syntax. Pierre Vicens produced a report on aspects of
speech recognition by computer which investigated the techniques and
methodologies which are useful in achieving close to real-time
recognition of speech [13]. In March of 1969, Raj Reddy, Dave
Espar, and Art Eisenson produced a 16mm color movie with sound
entitled "Hear Here". This film described the state of the speech
recognition project as of Spring, 1969. In addition, Raj Reddy
completed a report on the use of environmental, syntactic, and
probabilistic constraints in vision and speech [14] and Reddy and
R.B. Neely reported their research on the contextual analysis of
phonemes of English [15].
In 1970, a paper was presented by Raj Reddy, L. D. Erman,
and R. B. Neely concerning the speech recognition project at the
IEEE Systems Science and Cybernetics Conference. At this time
Professor Reddy left Stanford to join the faculty of Carnegie-Mellon
University and Dr. Arthur Samuel became the head of the Stanford
speech research efforts. Dr. Samuel was the developer of an
extremely successful machine learning scheme which had previously
been applied to the game of checkers [16], [17]. He resolved to
apply it to speech recognition.
By 1971 the first report on a speech recognition system
utilizing Samuel's learning scheme was written by George White [18].
This report was primarily concerned with the examination of the
properties of signature trees and the heuristics involved in
applying them to select an optimal minimal set of features for
recognition. Also at this time, M.M. Astrahan produced a report
describing his research on speech analysis by clustering, or the
hyperphoneme method [19]. This process attempted to do speech
recognition by mathematical classifications instead of the
traditional phonemes or linguistic categories. This was
accomplished by nearest-neighbor classification in a hyperspace
wherein cluster centers, or hyperphonemes, had been established.
In 1972 R. B. Thosar and A. L. Samuel presented a report
concerning some preliminary experiments in speech recognition using
signature tables [20]. This approach represented a general attack
on speech recognition employing learning mechanisms at each stage of
classification.
The speech effort in 1973 has been devoted to two areas.
First, a mathematically rigorous examination and improvement of the
signature table learning mechanism has been accomplished by R. B.
Thosar. Second, a segmentation scheme based on signature tables is
being developed to provide accurate segmentation together with
probabilities or confidence values for the most likely phoneme
occurring during each segment. This process attempts to extract as
much information about an acoustic signal as possible and to pass
this information to higher level processes. The preliminary results
of this segmentation scheme will be presented at the speech
segmentation workshop to be held in July at Carnegie-Mellon
University. In addition to these activities, a new, high speed pitch
detection scheme has been developed by J. A. Moorer and has been
submitted for publication [22].
C. BIBLIOGRAPHY
1. D. Raj Reddy, "Experiments on Automatic Speech Recognition by a
Digital Computer", AIM-26, October 1964, 19 pages.
2. D. Raj Reddy", An Approach to Computer Speech Recognition by
Direct Analysis of the Speech Waveform", AIM-43, September 1966, 144
pages.
3. D. Raj Reddy, "Segmentation of Speech Sounds," J. Acoust. Soc.
Amer., August 1966.
4. D. Raj Reddy, "Phoneme Grouping for Speech Recognition," J.
Acoust. Soc. Amer., May, 1967.
5. D. Raj Reddy, "Pitch Period Determination of Speech Sounds,"
Comm. ACM, June, 1967.
6. D. Raj Reddy, "Computer Recognition of Connected Speech," J.
Acoust. Soc. Amer., August, 1967.
7. Pierre Vicens, "Preprocessing for Speech Analysis", AIM-71,
October 1968, 33 pages.
8. D. Raj Reddy, "Computer Transcription of Phonemic Symbols", J.
Acoust. Soc. Amer., August 1968.
9. D. Raj Reddy, and Ann Robinson, "Phoneme-To-Grapheme Translation
of English", IEEE Trans. Audio and Electroacoustics, June 1968.
10. D. Raj Reddy, and P. Vicens, "Procedures for Segmentation of
Connected Speech," J. Audio Eng. Soc., October 1968.
11. D. Raj Reddy, "Consonantal Clustering and Connected Speech
Recognition", Proc. Sixth International Congress of Acoustics, Vol. 2,
pp. C-57 to C-60, Tokyo, 1968.
12. John McCarthy, Lester Earnest, D. Raj Reddy, and Pierre Vicens,
"A Computer With Hands, Eyes, and Ears", Proceedings of the Fall
Joint Computer Conference, 1968.
13. Pierre Vicens, "Aspects of Speech Recognition by Computer",
AIM-85, April 1969, 210 pages.
14. D. Raj Reddy, "On the Use of Environmental, Syntactic and
Probabilistic Constraints in Vision and Speech", AIM-78, January
1969, 23 pages.
15. D. Raj Reddy and R. B. Neely, "Contextual Analysis of Phonemes
of English", AIM-79, January 1969, 71 pages.
16. A. L. Samuel, "Some Studies in Machine Learning Using the Game
of Checkers," IBM Journal 3, 211-229 (1959).
17. A. L. Samuel, "Some Studies in Machine Learning Using the Game
of Checkers, II - Recent Progress," IBM Journal 11, 601-617 (1967).
18. George M. White, "Machine Learning Through Signature Trees:
Applications to Human Speech", AIM-136, October 1970, 40 pages.
19. M. M. Astrahan, "Speech Analysis by Clustering, or the Hyper-
phoneme Method", AIM-124, June 1970, 22 pages.
20. R. B. Thosar and A. L. Samuel, "Some Preliminary Experiments in
Speech Recognition Using Signature Table Learning", ARPA Speech
Understanding Research Group Note 43.
21. R. B. Thosar, "Estimation of Probability Densities Using
Signature Tables for Application to Pattern Recognition", ARPA Speech
Understanding Research Group Note 81.
22. J. A. Moorer, "The Optimum-Comb Method of Pitch Period Analysis
in Speech", AIM-207, July 1973.
D. COGNIZANT PERSONNEL
For contractual matters:
Sponsored Projects Office
Stanford University
Stanford, California 94305
Telephone: (415) 321-2300, ext. 2883
For technical and scientific matters regarding this proposal:
Arthur L. Samuel
Computer Science Department
Stanford University
Stanford, California 94305
Telephone: (415) 321-2300, ext. 3330
For administrative matters, including questions relating
to the budget or property acquisition:
Mr. Lester D. Earnest
Computer Science Department
Stanford University
Stanford, California 94305
Telephone: (415) 321-2300, ext. 4202